Multiplicative Adjustment of Class Probability: Educating Naïve Bayes
Authors

Abstract
Starting from the Naïve Bayes model, we develop a new concept for aggregating items of evidence in classification problems. We show that in Naïve Bayes, each feature variable contributes a multiplicative adjustment factor to the estimated class probability. We next introduce a way of controlling the importance of the feature variables by raising each adjustment factor to a different power. The powers are chosen so as to maximize the accuracy of estimated class probabilities on the training data, and their optimal values are obtained by fitting a logistic regression model whose explanatory variables are constructed from the feature variables of the classification problem. This optimization accomplishes more than what feature selection does for Naïve Bayes. We call this new model family the Adjusted Probability Model (APM). We also define a regularized version, APMR. Experiments demonstrate that APMR is surprisingly effective. Assigning different degrees of importance to the feature variables seems to remove much of the naïveté from Naïve Bayes.
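The abstract's construction can be sketched concretely for the binary-class, binary-feature case: each feature's multiplicative adjustment factor becomes, in log space, an explanatory variable for a logistic regression, whose fitted coefficients are the per-feature exponents. The sketch below is illustrative only (toy data, names, and the smoothing choice are assumptions, not details from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data: 200 samples, 5 binary features (illustrative only)
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = (rng.random((200, 5)) < np.where(y[:, None] == 1, 0.7, 0.3)).astype(int)

def cond_prob(X, y, c):
    # Estimate P(x_i = 1 | class = c) with add-one (Laplace) smoothing
    Xc = X[y == c]
    return (Xc.sum(axis=0) + 1) / (len(Xc) + 2)

p1, p0 = cond_prob(X, y, 1), cond_prob(X, y, 0)

# Explanatory variables z_i = log of feature i's adjustment factor,
# i.e. log P(x_i | c=1) - log P(x_i | c=0) for the observed value x_i
Z = X * np.log(p1 / p0) + (1 - X) * np.log((1 - p1) / (1 - p0))

# Logistic regression fits one exponent w_i per feature. Setting all
# w_i = 1 recovers plain Naive Bayes; penalizing the coefficients
# (finite C here) corresponds in spirit to the regularized APMR.
apm = LogisticRegression(C=10.0).fit(Z, y)
print("fitted exponents:", apm.coef_.round(2))
print("training accuracy:", apm.score(Z, y))
```

Under this reading, the power weighting is exactly a reweighting of each feature's log likelihood ratio, which is why ordinary logistic-regression machinery suffices to optimize the exponents.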
Similar Resources
Interval Estimation Naïve Bayes
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with assumptions of conditional independence among features given the class, called naïve Bayes, is competitive with state-of-the-art classifiers. In this paper a new naive Bayes classifier called Interval Estimation naïve Bayes is proposed. Interval Estimation naïve Bayes proceeds in two phases. In the ...
Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
In many pattern recognition/classification problems the true class-conditional model and class probabilities are approximated, to reduce complexity and/or to ease statistical estimation. The approximated classifier is expected to perform worse, here measured by the probability of correct classification. We present an analysis valid in general, and easily computable formulas for ...
Learning Semi Naïve Bayes Structures by Estimation of Distribution Algorithms
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier called naïve Bayes is competitive with state-of-the-art classifiers. This simple approach stems from assumptions of conditional independence among features given the class. Improvements in the accuracy of naïve Bayes have been demonstrated by a number of approaches, collectively named semi naïve Bayes classi...
Bayesian Models to Assess Risk of Corruption of Federal Management Units
This paper presents a data mining project that generated Bayesian models to assess the risk of corruption of federal management units. With thousands of extracted features related to corruptibility, the data were processed using techniques such as correlation analysis and per-class variance. We also compared two different discretization methods: Minimum Description Length Principle (MDLP) and Class-At...
Journal:
Volume Issue
Pages -
Publication date: 2002